Homework 6¶

Gianni Spiga¶

STA 209, Fall 2023¶

For this assignment, I discussed theory and method of approaches with Jonas Kempf.

1 Experimenting with Newton’s Method¶

In this assignment, you will experiment with gradient descent and Newton’s method. Complete the partial python code is included in the zip file. Areas that you should fill in are marked with “TODO” comments.

You should turn in an archive containing all of your source code, and include all plots and answers to underlined questions in your problem set writeup.

1.1 Complete the Code.¶

All algorithms are implemented in “algorithms.py”. An “exact” line search algorithm (using the bisection method) is fully implemented in “algorithms.py”. You should understand the implementation of the algorithm. Feel free to reuse code from your previous assignment. In algorithms.py you need to fill in the following lines marked TODO:

  • (a) In newton(), implement the calculation of the Newton update direction and the Newton decrement
In [ ]:
import time
import numpy as np

def bisection( func, x, direction, eps=1e-9, maximum_iterations=65536 ):
    """ 
    'Exact' linesearch (using bisection method)
    func:               the function to optimize It is called as "value, gradient = func( x, 1 )
    x:                  the current iterate
    direction:          the direction along which to perform the linesearch
    eps:                the maximum allowed error in the resulting stepsize t
    maximum_iterations: the maximum allowed number of iterations
    """
    
    x = np.asarray( x )
    direction = np.asarray( direction )
    
    if eps <= 0:
        raise ValueError("Epsilon must be positive")
        
    _, gradient = func( x , 1 )
    gradient = np.asarray( gradient )
    
    # checking that the given direction is indeed a descent direciton
    if np.vdot( direction, gradient )  >= 0:
        return 0
    
    else:
        
        # setting an upper bound on the optimum.
        MIN_t = 0
        MAX_t = 1
        iterations = 0
        
        value, gradient = func( x + MAX_t * direction, 1 )
        value = np.double( value )
        gradient = np.asarray( gradient )
        
        while np.vdot( direction, gradient ) < 0:
            
            MAX_t *= 2
            
            value, gradient = func( x + MAX_t * direction, 1 )
            
            iterations += 1
            
            if iterations >= maximum_iterations:
                raise ValueError("Too many iterations")
        
        # bisection search in the interval (MIN_t, MAX_t)
        iterations = 0
        
        while True:        
        
            t = ( MAX_t + MIN_t ) / 2
            
            value, gradient = func( x + t * direction, 1 )
            value = np.double( value )
            gradient = np.asarray( gradient )
        
            suboptimality = abs( np.vdot( direction, gradient ) ) * (MAX_t - t)
            
            if suboptimality <= eps:
                break
            
            if np.vdot( direction, gradient ) < 0:
                MIN_t = t
            else:
                MAX_t = t
            
            iterations += 1
            if iterations >= maximum_iterations:
                raise ValueError("Too many iterations")
                
        return t
    
###############################################################################
def newton( func, initial_x, eps=1e-5, maximum_iterations=65536, linesearch=bisection, *linesearch_args  ):
    """ 
    Newton's Method
    func:               the function to optimize It is called as "value, gradient, hessian = func( x, 2 )
    initial_x:          the starting point
    eps:                the maximum allowed error in the resulting stepsize t
    maximum_iterations: the maximum allowed number of iterations
    linesearch:         the linesearch routine
    *linesearch_args:   the extra arguments of linesearch routine
    """
    
    if eps <= 0:
        raise ValueError("Epsilon must be positive")
    x = np.asarray( initial_x.copy() )
    
    # initialization
    values = []
    runtimes = []
    xs = []
    start_time = time.time()
    iterations = 0
    
    # newton's method updates
    while True:
        
        value, gradient, hessian = func( x , 2 )
        value = np.double( value )
        gradient = np.matrix( gradient )
        hessian = np.matrix( hessian ) 
        
        # updating the logs
        values.append( value )
        runtimes.append( time.time() - start_time )
        xs.append( x.copy() )

        ### TODO: Compute the Newton update direction
        direction = -np.linalg.inv(hessian) @ gradient

        ### TODO: Compute the Newton decrement
        newton_decrement = np.sqrt(gradient.T @ np.linalg.inv(hessian) @ gradient)
        #if (### TODO: stop criterion):
        #    break

        if newton_decrement <= eps:
            break
        
        t = linesearch( func, x, direction )

        x += t * np.asarray( direction )

        iterations += 1
        if iterations >= maximum_iterations:
            raise ValueError("Too many iterations")
    
    return (x, values, runtimes, xs)
    

1.2 Quadratic Objective¶

The python script main quadratic.py creates a contour plot for the following objective (con- tained in hw6–functions.py) :

  • (a) Draw a contour plot of the function and plot the trajectory of Newton’s method and GD

on top of that.

In [ ]:
######
######  This script compares different optimization methods on optimizating
######  a quadratic function of the form f(x) = 1/2 * x'Hx + x'b 
######
%matplotlib inline
import numpy as np
import hw6_functions as hw
import algorithms as alg

import plotly.graph_objects as go

import plotly.io as pio

# This ensures Plotly output works in multiple places:
# plotly_mimetype: VS Code notebook UI
# notebook: "Jupyter: Export to HTML" command in VS Code
# See https://plotly.com/python/renderers/#multiple-renderers
pio.renderers.default = "plotly_mimetype+notebook"


# the hessian matrix of the quadratic function
H = np.matrix('1 0; 0 30')

# the vector of linear coefficient of the quadratic function
b = np.matrix('0; 0')

# Choose the quadratic objective. This notation defines a "closure", returning
# an oracle function which takes x (and order) as its only parameter, and calls
# obj.quadratic with parameters H and b defined above, and the parameters of 
# the closure (x and order)
func = lambda x, order: hw.quadratic( H, b, x, order )


# Start at (4,0.3), with an identity inverse Hessian
initial_x = np.matrix('4; 0.3')
initial_inverse_hessian = np.eye( 2 )
# Find the (1e-4)-suboptimal solution
eps = 1e-4
maximum_iterations = 65536

# Run the algorithms
x, values, runtimes, gd_xs = alg.gradient_descent(func, initial_x, eps, maximum_iterations, alg.bisection )

x, values, runtimes, newton_xs = alg.newton(func, initial_x, eps, maximum_iterations, alg.bisection )


#Z = hw.draw_contour(func, gd_xs=gd_xs, newton_xs=newton_xs)

#fig = go.Figure(data = 
#    go.Contour(
#        z = Z,
#        contours_coloring = 'lines',
#        line_width = 2,
#    ))

#fig.show()
hw.draw_contour(func, gd_xs=gd_xs, newton_xs=newton_xs)
  • (b) How do you explain the number of iterations in Newton’s method?

    Since Newton's method takes both information from the Gradient and the Hessian, we are able to converge in only one iteration to the minimum. With the gradient, since we have less information about our function, we need 55 iterations to converge.

  • (c) Play around with different Hessian matrices, and explore the impact of various choices on

the performance of gradient descent. Describe your results.

In [ ]:
# the hessian matrix of the quadratic function
H = np.matrix('100 0; 0 300')

# the vector of linear coefficient of the quadratic function
b = np.matrix('0; 0')

func = lambda x, order: hw.quadratic( H, b, x, order )

# Start at (4,0.3), with an identity inverse Hessian
initial_x = np.matrix('4; 0.3')
initial_inverse_hessian = np.eye( 2 )
# Find the (1e-4)-suboptimal solution
eps = 1e-4
maximum_iterations = 65536

# Run the algorithms
x, values, runtimes, gd_xs = alg.gradient_descent(func, initial_x, eps, maximum_iterations, alg.bisection )

x, values, runtimes, newton_xs = alg.newton(func, initial_x, eps, maximum_iterations, alg.bisection )

hw.draw_contour(func, gd_xs=gd_xs, newton_xs=newton_xs)
In [ ]:
# the hessian matrix of the quadratic function
H = np.matrix('8 0; 0 5')

# the vector of linear coefficient of the quadratic function
b = np.matrix('0; 0')

func = lambda x, order: hw.quadratic( H, b, x, order )

# Start at (4,0.3), with an identity inverse Hessian
initial_x = np.matrix('4; 0.3')
initial_inverse_hessian = np.eye( 2 )
# Find the (1e-4)-suboptimal solution
eps = 1e-4
maximum_iterations = 65536

# Run the algorithms
x, values, runtimes, gd_xs = alg.gradient_descent(func, initial_x, eps, maximum_iterations, alg.bisection )

x, values, runtimes, newton_xs = alg.newton(func, initial_x, eps, maximum_iterations, alg.bisection )

hw.draw_contour(func, gd_xs=gd_xs, newton_xs=newton_xs)

With Hessians that have diagonals more similar in value, the gradient descent and Newton method have similar iterations. However, the larger scale of the Hessian, Newtons method performs consistently quick while GD takes slower.

1.3 Sum-Exp Objective¶

The python script main–sum–exp.py creates a contour plot for the following objective (con- tained in hw6–functions.py): $$ f (x) = \sum_i \exp (a^T_i x + b_i) $$ for some $a_i$ and $b_i$’s.

  • (a) Draw a contour plot of the function and plot the trajectory of Newton’s method and GD

on top of that.

In [ ]:
######
######  This script compares different optimization methods on optimizating
######  a sum-exp Objective of the form sum_i exp( < a_i , x > + b_i )
######

%matplotlib inline
import numpy as np
import hw6_functions as hw
import algorithms as alg


# a is the vector of coefficients
a = np.matrix('1 3; 1 -3; -1 0')
# b is the vector of bias terms
b = np.matrix('-0.1; -0.1; -0.1')

# Choose the sum-exp objective. This notation defines a "closure", returning
# an oracle function which takes x (and order) as its only parameter, and calls
# obj.sum_exp with parameters a and b defined above, and the parameters of 
# the closure (x and order)
func = lambda x, order: hw.sum_exp( a, b, x, order )

# Use backtracking line search. This defines a closure which takes three
# parameters. Note that the alpha and beta parameters to the backtracking line
# search are 0.4 and 0.9, respectively
linesearch = lambda f, x, direction: alg.backtracking( f, x, direction, 0.4, 0.9 ) 

# Start at (-1,1.5), with (for BFGS) an identity inverse Hessian
initial_x = np.matrix('-1; 1.5')
initial_inverse_hessian = np.eye( 2 )
# Find the (1e-4)-suboptimal solution
eps = 1e-4
maximum_iterations = 65536

# Setting the backtracking line search parameters
alpha = 0.4
beta = 0.9

# Run the algorithms
x, values, runtimes, gd_xs = alg.gradient_descent( func, initial_x, eps, maximum_iterations, alg.backtracking, alpha, beta )

x, values, runtimes, newton_xs = alg.newton( func, initial_x, eps, maximum_iterations, alg.backtracking, alpha, beta )

# Draw contour plots
hw.draw_contour( func, gd_xs, newton_xs, levels=np.arange(5, 1000, 10), x=np.arange(-3, 1.04, 0.04), y=np.arange(-2, 2.04, 0.04) )

2 Implementation and comparison of Acceleration¶

Chapter 4: Problem 2, and 3. You can code these problems in Matlab or Python. No need to do part (c) of Problem 2

In [ ]:
import numpy as np

# Parameters
# Textbook calls m "mu"
mu = 0.01 
L = 1
kappa = L / mu
n = 100

# Generating random matrix A with specific eigenvalues
A = np.random.randn(n, n)
Q, R = np.linalg.qr(A)  # QR decomposition

D = np.random.rand(n, 1)
D = 10 ** D
Dmin = np.min(D)
Dmax = np.max(D)
D = (D - Dmin) / (Dmax - Dmin)
D = mu + D * (L - mu)

A = Q.T @ np.diag(D.ravel()) @ Q

# Tolerance and maximum iterations
epsilon = 1e-6
kmax = 1000

def steepest_descent(A, x0, alpha = 0.01, epsilon=1e-6, kmax=1000, linesearch = False):
    # This function can do custom alpha or exact line search
    # If exact linesearch is desired, alpha is arbitrary 
    x = x0.copy()
    f_values = []

    for k in range(kmax):
        gradient = A @ x

        if linesearch == True:
            alpha = (gradient.T @ gradient) / (gradient.T @ A @ gradient)

        # Update step
        x = x - alpha * gradient
        f_value = 0.5 * x.T @ A @ x
        f_values.append(f_value.item())

        if np.linalg.norm(gradient) < epsilon:
            break

    return x, f_values, k

def steepest_descent_exact_line_search(A, x0, epsilon=1e-6, kmax=1000):
    x = x0.copy()
    for k in range(kmax):
        gradient = A @ x

        alpha = (gradient.T @ gradient) / (gradient.T @ A @ gradient)

        # Update x        # Compute the optimal step size for the current gradient

        x = x - alpha * gradient

        if np.linalg.norm(gradient) < epsilon:
            break

    return k   # Returning the number of iterations

def heavy_ball_method(A, x0, alpha, beta, epsilon=1e-6, kmax=1000):
    x = x0.copy()

    f_values = [] 
    x_prev = np.zeros_like(x0)  # Initialize x_{k-1}
    for k in range(kmax):
        gradient = A @ x

        # Heavy-ball update
        x_next = x - alpha * gradient + beta * (x - x_prev)
        x_prev = x
        x = x_next

        f_value = 0.5 * x.T @ A @ x
        f_values.append(f_value.item())

        if np.linalg.norm(gradient) < epsilon:
            break

    return x, f_values, k 

def nesterovs_method(A, x0, alpha, beta, epsilon=1e-6, kmax=1000):
    x = x0.copy()
    #x_prev = np.zeros_like(x0)  # Initialize x_{k-1}
    y = x.copy()

    f_values = [] 
    for k in range(kmax):
        gradient = A @ y

        # Nesterov's update
        #y = x_prev - (1/beta)*gradient

        x_next = y - alpha * gradient
        y = x_next + beta * (x_next - x)
        #x_prev = x
        x = x_next

        f_value = 0.5 * x.T @ A @ x
        f_values.append(f_value.item())

        if np.linalg.norm(gradient) < epsilon:
            break

    return x, f_values, k

num_trials = 10
total_iterations = []

# Perform optimization for each trial
# This is step size 2/(m + L)
for trial in range(num_trials):
    # Initial point for each trial
    x0 = np.random.randn(n, 1)

    optimal_x_1, f_values_1, iterations_1 = steepest_descent(A, x0, 2/(m + L), linesearch= False)
    total_iterations.append(iterations_1)

# Calculate the average number of iterations
average_iterations = np.mean(total_iterations) # / num_trials
print("Average number of iterations over", num_trials, "trials for step size 2/(m + L):", average_iterations)
#print("My x_1 is ", optimal_x_1)
#print(f_values_1)

#  Step size 1/L
for trial in range(num_trials):
    # Initial point for each trial
    x0 = np.random.randn(n, 1)

    optimal_x_2, f_values_2, iterations_2 = steepest_descent(A, x0, 1/L)
    total_iterations.append(iterations_2)

# Calculate the average number of iterations
average_iterations = np.mean(total_iterations) # / num_trials
print("Average number of iterations over", num_trials, "trials for step size 1/L:", average_iterations)


# Now with Linesearch
for trial in range(num_trials):
    # Initial point for each trial
    x0 = np.random.randn(n, 1)

    optimal_x_3, f_values_3, iterations = steepest_descent(A, x0, linesearch=True)
    total_iterations.append(iterations)

# Calculate the average number of iterations
average_iterations = np.mean(total_iterations) # / num_trials
print("Average number of iterations over", num_trials, "trials for exact linesearch:", average_iterations)


# Now we roll with the heavyball  
for trial in range(num_trials):
    # Initial point for each trial
    x0 = np.random.randn(n, 1)

    optimal_x_4, f_values_4, iterations = heavy_ball_method(A, x0, alpha = 4 / (np.sqrt(L) + np.sqrt(m))**2, beta = (np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m)))
    total_iterations.append(iterations)

# Calculate the average number of iterations
average_iterations = np.mean(total_iterations) # / num_trials
print("Average number of iterations over", num_trials, "trials for Heavy-Ball Method:", average_iterations)

# And finally, Nesterovs 
for trial in range(num_trials):
    # Initial point for each trial
    x0 = np.random.randn(n, 1)

    optimal_x_5, f_values_5, iterations = nesterovs_method(A, x0, alpha = 1/L, beta = (np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m)))
    total_iterations.append(iterations)

# Calculate the average number of iterations
average_iterations = np.mean(total_iterations) # / num_trials
print("Average number of iterations over", num_trials, "trials for Nesterov's Optimal Method:", average_iterations)

# Check for positive definite
#np.all(np.linalg.eigvals(A) > 0)
Average number of iterations over 10 trials for step size 2/(m + L): 666.8
Average number of iterations over 10 trials for step size 1/L: 783.2
Average number of iterations over 10 trials for exact linesearch: 674.4
Average number of iterations over 10 trials for Heavy-Ball Method: 543.475
Average number of iterations over 10 trials for Nesterov's Optimal Method: 456.56

Draw a plot of the convergence behavior on a typical run, plotting iteration number against $log_{10}(f (xk) - f (x∗))$. (Use the same figure, with different colors for each algorithm.)

In [ ]:
import plotly.express as px

# Perform steepdest descent
#optimal_x, f_values, arb = steepest_descent(A, x0, 2/(m + L))



# Compute optimal function value for comparison
f_star_1 = 1/2 * optimal_x_1.T @ A @ optimal_x_1 # func(A, optimal_x)
f_star_1 = f_star_1.item() # f-star is double nested matrix

f_star_2 = 1/2 * optimal_x_2.T @ A @ optimal_x_2 
f_star_2 = f_star_2.item() 

# Exact Linesearch
f_star_3 = 1/2 * optimal_x_3.T @ A @ optimal_x_3  
f_star_3 = f_star_3.item()

# Heavy Ball
f_star_4 = 1/2 * optimal_x_4.T @ A @ optimal_x_4  
f_star_4 = f_star_4.item()

# Nesterov
f_star_5 = 1/2 * optimal_x_5.T @ A @ optimal_x_5  
f_star_5 = f_star_5.item()

# Adding a small positive value to prevent dividing by zero error
log_diff_1 = [np.log10(max(val - f_star_1, 1e-15)) for val in f_values_1]  
log_diff_2 = [np.log10(max(val - f_star_2, 1e-15)) for val in f_values_2] 
log_diff_3 = [np.log10(max(val - f_star_3, 1e-15)) for val in f_values_3]
log_diff_4 = [np.log10(max(val - f_star_4, 1e-15)) for val in f_values_4]
log_diff_5 = [np.log10(max(val - f_star_5, 1e-15)) for val in f_values_5]

print(f_star_5)

# Plot Hack
first_label = np.repeat("alpha = 2/(m + L)", len(f_values_1))

# Plotting with Plotly
fig = px.line(x=list(range(len(f_values_1))), y=log_diff_1, color = first_label, labels={'x': 'Iteration', 'y': 'log10(f(x_k) - f(x*))', "color": "Algorithm"}, 
              title='Convergence of Different Methods')
fig.add_trace(go.Line(x=list(range(len(f_values_2))), y=log_diff_2, name= "alpha = 1/L"))
fig.add_trace(go.Line(x=list(range(len(f_values_3))), y=log_diff_3, name= "Exact Linesearch"))
fig.add_trace(go.Line(x=list(range(len(f_values_4))), y=log_diff_4, name= "Heavy Ball Method"))
fig.add_trace(go.Line(x=list(range(len(f_values_5))), y=log_diff_5, name= "Nesterov's Method"))
fig.show()
4.035820934655251e-11
c:\Users\Gianni\anaconda3\lib\site-packages\plotly\graph_objs\_deprecations.py:378: DeprecationWarning:

plotly.graph_objs.Line is deprecated.
Please replace it with one of the following more specific types
  - plotly.graph_objs.scatter.Line
  - plotly.graph_objs.layout.shape.Line
  - etc.


We can see from the above plot the Nesterov's Accelerated Method performs the best, followed by the Heavy Ball method. The slowest is the steepest descent with step size $\alpha = \frac{1}{L}$.

Problem 3¶

In [ ]:
import numpy as np

# Parameters
# Textbook calls m "mu"
mu = 0
L = 1
n = 100

# Generating random matrix A with specific eigenvalues
A = np.random.randn(n, n)
Q, R = np.linalg.qr(A)  # QR decomposition

D = np.random.rand(n, 1)
D = 10 ** D
Dmin = np.min(D)
Dmax = np.max(D)
D = (D - Dmin) / (Dmax - Dmin)
D = mu + D * (L - mu)

A = Q.T @ np.diag(D.ravel()) @ Q

# Tolerance and maximum iterations
epsilon = 1e-6
kmax = 1000

def steepest_descent(A, x0, alpha = 0.01, epsilon=1e-6, kmax=1000, linesearch = False):
    # This function can do custom alpha or exact line search
    # If exact linesearch is desired, alpha is arbitrary 
    x = x0.copy()
    f_values = []

    for k in range(kmax):
        gradient = A @ x

        if linesearch == True:
            alpha = (gradient.T @ gradient) / (gradient.T @ A @ gradient)

        # Update step
        x = x - alpha * gradient
        f_value = 0.5 * x.T @ A @ x
        f_values.append(f_value.item())

        if np.linalg.norm(gradient) < epsilon:
            break

    return x, f_values, k

def steepest_descent_exact_line_search(A, x0, epsilon=1e-6, kmax=1000):
    x = x0.copy()
    for k in range(kmax):
        gradient = A @ x

        alpha = (gradient.T @ gradient) / (gradient.T @ A @ gradient)

        # Update x        # Compute the optimal step size for the current gradient

        x = x - alpha * gradient

        if np.linalg.norm(gradient) < epsilon:
            break

    return k   # Returning the number of iterations

def heavy_ball_method(A, x0, alpha, beta, epsilon=1e-6, kmax=1000):
    x = x0.copy()

    f_values = [] 
    x_prev = np.zeros_like(x0)  # Initialize x_{k-1}
    for k in range(kmax):
        gradient = A @ x

        # Heavy-ball update
        x_next = x - alpha * gradient + beta * (x - x_prev)
        x_prev = x
        x = x_next

        f_value = 0.5 * x.T @ A @ x
        f_values.append(f_value.item())

        if np.linalg.norm(gradient) < epsilon:
            break

    return x, f_values, k 

def nesterovs_method(A, x0, alpha, beta, epsilon=1e-6, kmax=1000):
    x = x0.copy()
    #x_prev = np.zeros_like(x0)  # Initialize x_{k-1}
    y = x.copy()

    f_values = [] 
    for k in range(kmax):
        gradient = A @ y

        # Nesterov's update
        #y = x_prev - (1/beta)*gradient

        x_next = y - alpha * gradient
        y = x_next + beta * (x_next - x)
        #x_prev = x
        x = x_next

        f_value = 0.5 * x.T @ A @ x
        f_values.append(f_value.item())

        if np.linalg.norm(gradient) < epsilon:
            break

    return x, f_values, k

num_trials = 10
total_iterations = []

# Perform optimization for each trial
# This is step size 2/(m + L)
for trial in range(num_trials):
    # Initial point for each trial
    x0 = np.random.randn(n, 1)

    optimal_x_1, f_values_1, iterations_1 = steepest_descent(A, x0, 2/(m + L), linesearch= False)
    total_iterations.append(iterations_1)

# Calculate the average number of iterations
average_iterations = np.mean(total_iterations) # / num_trials
print("Average number of iterations over", num_trials, "trials for step size 2/(m + L):", average_iterations)
#print("My x_1 is ", optimal_x_1)
#print(f_values_1)

#  Step size 1/L
for trial in range(num_trials):
    # Initial point for each trial
    x0 = np.random.randn(n, 1)

    optimal_x_2, f_values_2, iterations_2 = steepest_descent(A, x0, 1/L)
    total_iterations.append(iterations_2)

# Calculate the average number of iterations
average_iterations = np.mean(total_iterations) # / num_trials
print("Average number of iterations over", num_trials, "trials for step size 1/L:", average_iterations)


# Now with Linesearch
for trial in range(num_trials):
    # Initial point for each trial
    x0 = np.random.randn(n, 1)

    optimal_x_3, f_values_3, iterations = steepest_descent(A, x0, linesearch=True)
    total_iterations.append(iterations)

# Calculate the average number of iterations
average_iterations = np.mean(total_iterations) # / num_trials
print("Average number of iterations over", num_trials, "trials for exact linesearch:", average_iterations)


# Now we roll with the heavyball  
for trial in range(num_trials):
    # Initial point for each trial
    x0 = np.random.randn(n, 1)

    optimal_x_4, f_values_4, iterations = heavy_ball_method(A, x0, alpha = 4 / (np.sqrt(L) + np.sqrt(m))**2, beta = (np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m)))
    total_iterations.append(iterations)

# Calculate the average number of iterations
average_iterations = np.mean(total_iterations) # / num_trials
print("Average number of iterations over", num_trials, "trials for Heavy-Ball Method:", average_iterations)

# And finally, Nesterovs 
for trial in range(num_trials):
    # Initial point for each trial
    x0 = np.random.randn(n, 1)

    optimal_x_5, f_values_5, iterations = nesterovs_method(A, x0, alpha = 1/L, beta = (np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m)))
    total_iterations.append(iterations)

# Calculate the average number of iterations
average_iterations = np.mean(total_iterations) # / num_trials
print("Average number of iterations over", num_trials, "trials for Nesterov's Optimal Method:", average_iterations)

# Check for positive definite
#np.all(np.linalg.eigvals(A) > 0)
Average number of iterations over 10 trials for step size 2/(m + L): 999.0
Average number of iterations over 10 trials for step size 1/L: 999.0
Average number of iterations over 10 trials for exact linesearch: 999.0
Average number of iterations over 10 trials for Heavy-Ball Method: 801.375
Average number of iterations over 10 trials for Nesterov's Optimal Method: 772.08
In [ ]:
import plotly.express as px

# Perform steepdest descent
#optimal_x, f_values, arb = steepest_descent(A, x0, 2/(m + L))



# Compute optimal function value for comparison
f_star_1 = 1/2 * optimal_x_1.T @ A @ optimal_x_1 # func(A, optimal_x)
f_star_1 = f_star_1.item() # f-star is double nested matrix

f_star_2 = 1/2 * optimal_x_2.T @ A @ optimal_x_2 
f_star_2 = f_star_2.item() 

# Exact Linesearch
f_star_3 = 1/2 * optimal_x_3.T @ A @ optimal_x_3  
f_star_3 = f_star_3.item()

# Heavy Ball
f_star_4 = 1/2 * optimal_x_4.T @ A @ optimal_x_4  
f_star_4 = f_star_4.item()

# Nesterov
f_star_5 = 1/2 * optimal_x_5.T @ A @ optimal_x_5  
f_star_5 = f_star_5.item()

# Adding a small positive value to prevent dividing by zero error
log_diff_1 = [np.log10(max(val - f_star_1, 1e-15)) for val in f_values_1]  
log_diff_2 = [np.log10(max(val - f_star_2, 1e-15)) for val in f_values_2] 
log_diff_3 = [np.log10(max(val - f_star_3, 1e-15)) for val in f_values_3]
log_diff_4 = [np.log10(max(val - f_star_4, 1e-15)) for val in f_values_4]
log_diff_5 = [np.log10(max(val - f_star_5, 1e-15)) for val in f_values_5]

print(f_star_5)

# Plot Hack
first_label = np.repeat("alpha = 2/(m + L)", len(f_values_1))

# Plotting with Plotly
fig = px.line(x=list(range(len(f_values_1))), y=log_diff_1, color = first_label, labels={'x': 'Iteration', 'y': 'log10(f(x_k) - f(x*))', "color": "Algorithm"}, 
              title='Convergence of Different Methods with m = 0')
fig.add_trace(go.Line(x=list(range(len(f_values_2))), y=log_diff_2, name= "alpha = 1/L"))
fig.add_trace(go.Line(x=list(range(len(f_values_3))), y=log_diff_3, name= "Exact Linesearch"))
fig.add_trace(go.Line(x=list(range(len(f_values_4))), y=log_diff_4, name= "Heavy Ball Method"))
fig.add_trace(go.Line(x=list(range(len(f_values_5))), y=log_diff_5, name= "Nesterov's Method"))
fig.show()
2.647269010597991e-10
c:\Users\Gianni\anaconda3\lib\site-packages\plotly\graph_objs\_deprecations.py:378: DeprecationWarning:

plotly.graph_objs.Line is deprecated.
Please replace it with one of the following more specific types
  - plotly.graph_objs.scatter.Line
  - plotly.graph_objs.layout.shape.Line
  - etc.


From the iteration counts, we can see our convergence takes much longer now than before since $f$ is weakly convex. Along with this, we can also see from the convergence plots that all steepdescent methods perform similarly in convergence to the minimum, since now having a step size based on $m$ is arbitrary in this function.